
Conversation

@RXamzin commented Oct 6, 2025

After updating Go to v1.24+, a sharp increase in CPU utilization was detected. A heap profile revealed increased memory allocations in the Write and Close methods of the stateless gzip.Writer mode. This PR optimizes the problem area by reusing the tokens object through a sync.Pool and allocating it only when it is actually needed.
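
For reference, a minimal sketch of how the stateless path is typically exercised, assuming the klauspost/compress gzip package and its StatelessCompression level (an assumption about the call site; the production code that showed the regression may differ). In this mode every Write and Close goes through flate/stateless.go, which is where the heap profile pointed:

package main

import (
	"bytes"
	"log"

	"github.com/klauspost/compress/gzip"
)

func main() {
	var buf bytes.Buffer

	// Stateless mode keeps no encoder state between calls, so each block is
	// encoded from scratch; before this change that cost a fresh tokens
	// allocation per operation (roughly the 540 KB/op visible in the
	// benchmarks below).
	zw, err := gzip.NewWriterLevel(&buf, gzip.StatelessCompression)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := zw.Write([]byte("payload to compress")); err != nil {
		log.Fatal(err)
	}
	if err := zw.Close(); err != nil {
		log.Fatal(err)
	}
}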

Benchmarks:

BEFORE

BenchmarkEncodeDigitsSL1e4-12              10141            115946 ns/op          86.25 MB/s      542379 B/op          3 allocs/op
BenchmarkEncodeDigitsSL1e5-12               1602            730674 ns/op         136.86 MB/s      541377 B/op          2 allocs/op
BenchmarkEncodeDigitsSL1e6-12                175           6851506 ns/op         145.95 MB/s      541542 B/op          2 allocs/op
BenchmarkEncodeTwainSL1e4-12                9708            131564 ns/op          76.01 MB/s      542146 B/op          3 allocs/op
BenchmarkEncodeTwainSL1e5-12                1663            684854 ns/op         146.02 MB/s      541463 B/op          2 allocs/op
BenchmarkEncodeTwainSL1e6-12                 177           6435648 ns/op         155.38 MB/s      541654 B/op          2 allocs/op

AFTER

BenchmarkEncodeDigitsSL1e4-12              34747             33800 ns/op         295.86 MB/s           8 B/op          0 allocs/op
BenchmarkEncodeDigitsSL1e5-12               1771            640723 ns/op         156.07 MB/s         160 B/op          0 allocs/op
BenchmarkEncodeDigitsSL1e6-12                181           6759226 ns/op         147.95 MB/s        1573 B/op          0 allocs/op
BenchmarkEncodeTwainSL1e4-12               35294             35304 ns/op         283.26 MB/s           8 B/op          0 allocs/op
BenchmarkEncodeTwainSL1e5-12                1939            585755 ns/op         170.72 MB/s         146 B/op          0 allocs/op
BenchmarkEncodeTwainSL1e6-12                 181           6505389 ns/op         153.72 MB/s        1573 B/op          0 allocs/op

Summary by CodeRabbit

  • Refactor
    • Optimized compression internals to reuse buffers via pooling, improving throughput and reducing memory use during repeated operations.
    • Enhances performance and consistency for both dictionary and non-dictionary compression paths across large blocks.
    • No changes to public APIs or user-facing behavior; workflows remain the same.
    • Users may see faster compression and lower memory footprint under sustained/high-volume workloads.

coderabbitai bot commented Oct 6, 2025

📝 Walkthrough

Introduces a pooled tokens object (tokensPool) in flate/stateless.go. Replaces the stack-allocated dst with pooled instances, ensuring Reset on reuse and a deferred return to the pool. Updates the statelessEnc calls to pass the pooled dst pointer directly in both the no-dict and with-dict paths, integrating pooling into the compression loop.
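
A minimal, self-contained sketch of the pooling shape being described; the tokens type and compressBlock helper below are stand-ins for the real code in flate/stateless.go, not the actual implementation:

package main

import (
	"fmt"
	"sync"
)

// tokens stands in for flate's per-block token buffer, whose repeated
// allocation was the hot spot in the heap profile.
type tokens struct {
	n int
}

func (t *tokens) Reset() { t.n = 0 }

// tokensPool mirrors the existing bitWriterPool pattern: reuse instances
// instead of allocating one per compressed block.
var tokensPool = sync.Pool{
	New: func() interface{} {
		return &tokens{}
	},
}

func compressBlock(src []byte) {
	dst := tokensPool.Get().(*tokens)
	dst.Reset()               // clear whatever state the previous user left
	defer tokensPool.Put(dst) // return to the pool even on early return

	// ... encode src into dst and flush the block ...
	dst.n = len(src)
}

func main() {
	compressBlock([]byte("example input"))
	fmt.Println("ok")
}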

Changes

Cohort / File(s) Summary of Changes
Stateless tokens pooling and call adjustments
flate/stateless.go
Added an internal tokensPool for reusing *tokens; replaced the local dst with a pooled instance, calling Reset() on reuse and deferring Put(dst); updated statelessEnc invocations to pass the pooled dst directly in both the no-dict and dict branches; integrated pooling into the compression loop while keeping the existing bitWriterPool usage.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller as compressLoop
    participant Pool as tokensPool
    participant Enc as statelessEnc
    participant Dict as optional dict

    Caller->>Pool: Get() -> dst
    note right of Pool #e6f7ff: pooled *tokens returned
    Pool-->>Caller: dst
    Caller->>Caller: dst.Reset()

    alt no dict
        Caller->>Enc: statelessEnc(dst, src)
        Enc-->>Caller: compressed block produced
    else with dict
        Caller->>Dict: access dict
        Caller->>Enc: statelessEnc(dst, src, dict)
        Enc-->>Caller: compressed block produced
    end

    Caller->>Pool: Put(dst) (deferred)
    note right of Caller #f0ffe6: dst returned to pool after block processing

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title “flate: objects allocation optimization” succinctly captures the primary change of reducing object allocations in the flate package by introducing pooling without extraneous details. It accurately reflects the intent to optimize allocations, is clear and concise, and will be meaningful to teammates reviewing history.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9479a36 and f9a02fb.

📒 Files selected for processing (1)
  • flate/stateless.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • flate/stateless.go


@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
flate/stateless.go (1)

64-69: Consider using any for consistency.

The tokensPool correctly follows the sync.Pool pattern established by bitWriterPool. However, for consistency with line 59, consider using any instead of interface{} as the return type.

Apply this diff:

 // tokensPool contains tokens struct objects that can be reused
 var tokensPool = sync.Pool{
-	New: func() interface{} {
+	New: func() any {
 		return &tokens{}
 	},
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d10b525 and 9479a36.

📒 Files selected for processing (1)
  • flate/stateless.go (4 hunks)
🔇 Additional comments (3)
flate/stateless.go (3)

100-104: LGTM! Proper sync.Pool usage pattern.

The pooling implementation correctly follows best practices:

  • Obtains a tokens instance from the pool
  • Resets it immediately to ensure clean state
  • Uses defer to guarantee return to the pool even on early returns or errors

This change aligns with the PR's goal of reducing allocations, as confirmed by the benchmark improvements showing dramatic allocation reductions (from ~542KB/op down to 8 B/op in some cases).


128-128: LGTM! Consistent usage of pooled dst.

The changes correctly adapt to dst being a pooled *tokens instance:

  • Lines 128, 130: statelessEnc calls pass dst directly (not &dst), matching the function signature at line 176
  • Line 144: writeBlockDynamic receives dst as *tokens
  • Line 150: dst.Reset() correctly prepares the pooled instance for reuse in the next loop iteration

The integration maintains correctness while enabling the allocation optimizations described in the PR; a simplified sketch of the reuse pattern follows below.

Also applies to: 130-130, 144-144, 150-150
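
In outline, the reuse across loop iterations looks like the following. This is a simplified, self-contained sketch: encodeBlock and flushBlock are placeholders rather than the real statelessEnc and writeBlockDynamic, which take additional parameters.

package main

import "sync"

type tokens struct{ n int }

func (t *tokens) Reset() { t.n = 0 }

var tokensPool = sync.Pool{
	New: func() interface{} { return &tokens{} },
}

// encodeBlock and flushBlock stand in for statelessEnc and writeBlockDynamic;
// only the reuse pattern is illustrated here.
func encodeBlock(dst *tokens, src []byte) { dst.n = len(src) }
func flushBlock(dst *tokens)              {}

func compress(src []byte, blockSize int) {
	dst := tokensPool.Get().(*tokens) // one pooled instance for the whole call
	dst.Reset()
	defer tokensPool.Put(dst)

	for len(src) > 0 {
		todo := src
		if len(todo) > blockSize {
			todo = todo[:blockSize]
		}
		encodeBlock(dst, todo) // pass the pointer directly, no &dst needed
		flushBlock(dst)
		dst.Reset() // the same instance is reused for the next block
		src = src[len(todo):]
	}
}

func main() {
	compress(make([]byte, 100_000), 1<<15)
}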


64-69: No other tokens allocations found. All pointer instantiations of tokens in production code occur via tokensPool; other occurrences are value declarations or in tests and don’t require pooling.
